
Module 8: Step Functions for Orchestration

Developing Serverless Solutions on AWS

Topics

Why Step Functions?

Analogy: Recipe card. Step Functions is like a recipe that tells the kitchen (Lambda) what to do step by step. The recipe handles the flow (order, wait, branch), so each cook focuses only on their one task.

States Get Inputs, Do Work, Pass Outputs

Diagram: Start -> FirstState (Type: Task) -> SecondState (Type: Pass) -> End. The output of one state is the input of the next.

Each state receives input, performs work, and passes output to the next state.
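The input/output hand-off can be imitated locally. This is a minimal sketch in plain Python, not the Step Functions runtime; `first_state`, `second_state`, and `run_workflow` are hypothetical names used only for illustration.

```python
import json


def first_state(state_input):
    # Task-like state: does some work and adds the result to its output.
    output = dict(state_input)
    output["greeting"] = f"Hello, {state_input['name']}"
    return output


def second_state(state_input):
    # Pass-like state: forwards its input unchanged.
    return state_input


def run_workflow(states, execution_input):
    # Feed each state's output into the next state, in order.
    data = execution_input
    for state in states:
        data = state(data)
    return data


result = run_workflow([first_state, second_state], {"name": "Ada"})
print(json.dumps(result))
```

The final output carries both the original `name` field and the `greeting` added by the first state, which is the behavior the diagram describes.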

State Types

State | Purpose | Analogy
Task | Performs work (Lambda, service API, activity) | A cook making one dish
Choice | Branching logic (if/else based on input) | Fork in the road - which way?
Parallel | Runs branches concurrently, waits for all | Multiple cooks working at once
Map | Iterates over array items (same steps per item) | Assembly line - same process per unit
Wait | Pauses for specific/relative time | Timer on the oven
Pass | Passes input to output (or injects data) | Passing the baton in a relay
Succeed/Fail | Terminates workflow with success or failure | Finish line / stop sign

Task States - Perform the Work

A Task state performs work through a Lambda function, an AWS service API call, or an activity.

Integration Patterns

Pattern | Suffix | Behavior
Request-Response | (none) | Call and immediately continue
Run a Job | .sync | Wait until job completes
Callback | .waitForTaskToken | Pause until token returned externally

.sync = oven timer: you put the pie in and WAIT until it's done. Callback = delivery: you order and WAIT until someone knocks on your door with the result.
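For the Callback pattern, the external system resumes the workflow by returning the task token. The sketch below shows one way a worker might do that. `report_approval` and `RecordingClient` are hypothetical names; `RecordingClient` is a local stand-in so the example runs without AWS access. With real credentials you would pass `boto3.client("stepfunctions")` instead, whose `send_task_success` and `send_task_failure` calls take the same keyword arguments.

```python
import json


class RecordingClient:
    """Hypothetical stand-in for boto3.client("stepfunctions"); records calls."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, taskToken, output):
        self.calls.append(("success", taskToken, output))

    def send_task_failure(self, taskToken, error, cause):
        self.calls.append(("failure", taskToken, error, cause))


def report_approval(sfn_client, task_token, approved, reviewer):
    # Resume the paused execution: success continues the workflow,
    # failure raises an error in the waiting Task state.
    if approved:
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"approvedBy": reviewer}),
        )
    else:
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="Rejected",
            cause=f"Rejected by {reviewer}",
        )


client = RecordingClient()
report_approval(client, "token-123", True, "manager")
report_approval(client, "token-456", False, "manager")
print(client.calls)
```

The token values here are placeholders; in a real workflow the token is generated by Step Functions and handed to the external system by the Task state.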

Choice & Parallel States

Choice State (Branching): Pie type? apple -> Peel apples; cherry -> Pit cherries.
Parallel State (Concurrent): Prepare dough, Peel apples, and Heat oven run at the same time -> All done!

Choice = if/else routing. Parallel = run all at once, wait for all to finish.
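The two behaviors can be compared side by side in ordinary Python. This is only an analogy for the pie diagram, using threads in place of Parallel branches; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_prep(pie_type):
    # Choice: pick exactly one next step based on the input value.
    routes = {"apple": "Peel apples", "cherry": "Pit cherries"}
    return routes[pie_type]


def prepare_dough():
    return "dough ready"


def peel_apples():
    return "apples peeled"


def heat_oven():
    return "oven hot"


def run_parallel(branches):
    # Parallel: start every branch, then wait for all of them.
    # Results are collected in branch order, not completion order.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(branch) for branch in branches]
        return [future.result() for future in futures]


next_step = choose_prep("cherry")
results = run_parallel([prepare_dough, peel_apples, heat_oven])
print(next_step, results)
```

Like a Parallel state, `run_parallel` returns one result per branch as a list, so the step that follows sees all branch outputs together.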

Map State - Iterate Over Array

Assembly line: Same steps for each donut in the order. Map runs the Iterator once per item in the input array - concurrently!
// Input to Map state
{
  "donutorder": "C16",
  "ordered": [
    {"icing": "sprinkles", "quantity": 4},
    {"icing": "chocolate", "quantity": 4},
    {"icing": "lemon", "quantity": 4}
  ]
}
// Map runs: Fry -> Ice -> Box for EACH item in "ordered" array
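A local approximation of what Map does with this input, assuming hypothetical `fry`, `ice`, and `box` steps that each add one field to the item. It uses a thread pool to mimic concurrent iterations and is not the Step Functions runtime.

```python
from concurrent.futures import ThreadPoolExecutor

map_input = {
    "donutorder": "C16",
    "ordered": [
        {"icing": "sprinkles", "quantity": 4},
        {"icing": "chocolate", "quantity": 4},
        {"icing": "lemon", "quantity": 4},
    ],
}


def fry(item):
    return {**item, "fried": True}


def ice(item):
    return {**item, "iced": item["icing"]}


def box(item):
    return {**item, "boxed": True}


def process_item(item):
    # The same three steps run for every array item: Fry -> Ice -> Box.
    return box(ice(fry(item)))


with ThreadPoolExecutor() as pool:
    # pool.map keeps results in the same order as the input array.
    results = list(pool.map(process_item, map_input["ordered"]))

for donut in results:
    print(donut)
```

The output is one processed item per input item, in input order, which mirrors how a Map state returns an array of per-item results.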

Map Types

Wait States & Callback Patterns

Pattern | How | Use Case
Wait state | Pause for X seconds or until timestamp | Rate limiting, scheduled delays
.sync | Suffix on Task - waits for job completion | ECS tasks, Glue jobs, Batch jobs
.waitForTaskToken | Pause until external system returns token | Human approval, external API callback

Wait = set a kitchen timer. .sync = put pie in oven, wait until done. Callback = order delivery and wait for doorbell.
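The two forms of the Wait state (relative and absolute) differ only in which field they set. The sketch below builds both definitions as Python dictionaries; `wait_seconds` and `wait_until` are hypothetical helpers, and the `OrderComplete` target and the 2030 date are example values only.

```python
import json
from datetime import datetime, timezone


def wait_seconds(seconds, next_state):
    # Relative wait: pause for a fixed number of seconds.
    return {"Type": "Wait", "Seconds": seconds, "Next": next_state}


def wait_until(moment, next_state):
    # Absolute wait: pause until an ISO 8601 UTC timestamp.
    timestamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"Type": "Wait", "Timestamp": timestamp, "Next": next_state}


relative = wait_seconds(2, "OrderComplete")
absolute = wait_until(
    datetime(2030, 1, 1, 9, 0, tzinfo=timezone.utc), "OrderComplete"
)
print(json.dumps(relative))
print(json.dumps(absolute))
```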

Error Handling: Retry & Catch

"Retry": [{
  "ErrorEquals": ["States.TaskFailed"],
  "IntervalSeconds": 3,
  "MaxAttempts": 3,
  "BackoffRate": 2.0
}],
"Catch": [{
  "ErrorEquals": ["States.ALL"],
  "Next": "HandleErrorState"
}]
Retry = try baking again if it didn't rise. Catch = if all attempts fail, throw away the batch and alert the manager.
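To see what the Retry settings mean in practice, the wait before each retry can be computed from `IntervalSeconds` and `BackoffRate`. This sketch assumes no jitter and no maximum delay cap; `retry_delays` is a hypothetical helper.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    # The first retry waits interval_seconds; each later retry multiplies
    # the previous wait by backoff_rate.
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


delays = retry_delays(interval_seconds=3, max_attempts=3, backoff_rate=2.0)
print(delays)       # [3.0, 6.0, 12.0]
print(sum(delays))  # 21.0
```

With the values from the slide, the task is retried after 3, 6, and 12 seconds. If the third retry also fails, the Catch block takes over and routes to `HandleErrorState`.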

Filtering Inputs & Outputs

Data flow through a state:
InputPath (filter input) -> Parameters (reshape/enrich) -> Task / Worker (does work) -> ResultSelector (filter results) -> ResultPath (merge with input) -> OutputPath (final filter)
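ResultPath is the step that most often causes confusion, so here is a simplified local model of it. `apply_result_path` is a hypothetical helper that handles only `$`, `None`, and dotted paths such as `$.a.b`; the real service supports a richer path syntax.

```python
import copy


def apply_result_path(state_input, result, result_path):
    if result_path is None:
        # ResultPath null: discard the result, keep the original input.
        return state_input
    if result_path == "$":
        # ResultPath "$": the result replaces the input entirely.
        return result
    # Otherwise insert the result into a copy of the input at the given path.
    keys = result_path[2:].split(".")
    output = copy.deepcopy(state_input)
    target = output
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = result
    return output


order = {"orderId": "ORD-001", "total": 999}
merged = apply_result_path(order, "express", "$.shipMethod")
print(merged)
```

The merged output keeps `orderId` and `total` and gains `shipMethod`, which is how a state can add its result without losing the context it received.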

Standard vs Express Workflows

Feature | Standard | Express
Duration | Up to 1 year | Up to 5 minutes
Execution model | Exactly-once | At-least-once
Pricing | Per state transition | Per execution + duration + memory
Execution history | Full (console + API) | CloudWatch Logs only
Throughput | 2,000 starts/sec | 100,000+ starts/sec
Best for | Long-running, auditable workflows | High-volume, short, event processing

Standard = formal postal mail (tracked, guaranteed delivery, takes time). Express = text message (fast, high volume, no guaranteed receipt tracking).

Service Integrations

Deployment & Monitoring

Step Functions = universal remote control for AWS services. One state machine can orchestrate DynamoDB + Lambda + SQS + Bedrock without writing glue code.

What's New (2024-2025)

Q1: Which state type provides if/else branching logic?

C) Choice - Uses rules to determine the next state based on input values.
A: Task does work. B: Parallel runs branches concurrently. D: Map iterates over arrays.

Q2: How do you pause Step Functions until an external system completes?

B) .waitForTaskToken - Pauses execution and passes a token. External system calls SendTaskSuccess/Failure with that token to resume.
A: Wait is for fixed delays, not external completion. C: Wastes Lambda execution time. D: Parallel runs things concurrently, doesn't wait for external.

Q3: When should you use Express over Standard workflows?

B) High-volume, short event processing - Express handles 100K+ starts/sec, runs up to 5 min, costs less per execution.
A: Express max 5 min; use Standard for hours. C: Express is at-least-once; Standard is exactly-once. D: Express logs to CloudWatch only, no console history.

Q4: What does ResultPath do in the input/output processing?

B) Combines original input with task result - Specifies WHERE in the original input to insert the result. Preserves context.
A: That's InputPath. C: That's OutputPath. D: That's Parameters.

Module Summary

Live Demo: Order Processing Workflow

This demo creates a Step Functions state machine that demonstrates: Task, Choice, Parallel, Wait, and error handling.

Architecture

Start -> Validate Order -> Priority? -> Express Ship or Standard Ship -> Parallel: Notify + Update DB -> Wait 2s (simulate) -> Complete -> End

Demo Step 1: Create Lambda Functions

1. Create IAM Role

aws iam create-role --role-name sfn-demo-lambda-role \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

aws iam attach-role-policy --role-name sfn-demo-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

2. Create validate_order.py, notify_customer.py, update_inventory.py

# validate_order.py
def handler(event, context):
    # Reject orders that lack the fields later states depend on
    if not event.get("orderId") or not event.get("item"):
        raise Exception("InvalidOrder: Missing orderId or item")
    # Mark the order as validated and compute the total used by the Choice state
    event["validated"] = True
    event["total"] = event.get("quantity", 1) * event.get("price", 10)
    return event
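The slide lists notify_customer.py and update_inventory.py but shows only the validator. The following is a possible minimal version of the other two, not the course's actual code. It assumes each Parallel branch receives the validated order as input. The functions have distinct names here so the block runs as one file; when saving each into its own file, rename the function to `handler` to match the `--handler` values used at deploy time.

```python
# notify_customer.py (hypothetical sketch; rename the function to `handler`)
def notify_customer_handler(event, context):
    ship_method = event.get("shipMethod", "standard")
    message = (
        f"Order {event['orderId']} for {event['item']} "
        f"will ship via {ship_method}"
    )
    # print output from a Lambda function ends up in CloudWatch Logs.
    print(message)
    return {"notified": True, "message": message}


# update_inventory.py (hypothetical sketch; rename the function to `handler`)
def update_inventory_handler(event, context):
    # A real function would write to a database; this one only reports
    # what it would reserve.
    return {
        "inventoryUpdated": True,
        "item": event["item"],
        "reserved": event.get("quantity", 1),
    }


sample = {"orderId": "ORD-001", "item": "Laptop", "quantity": 1,
          "shipMethod": "express"}
notify_result = notify_customer_handler(sample, None)
inventory_result = update_inventory_handler(sample, None)
```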

3. Package & Deploy Each Function

# Package each function
zip validate_order.zip validate_order.py
zip notify_customer.zip notify_customer.py
zip update_inventory.zip update_inventory.py

# Create functions (wait 10s after role creation for propagation)
aws lambda create-function --function-name sfn-validate-order \
  --runtime python3.12 --handler validate_order.handler \
  --role arn:aws:iam::ACCOUNT:role/sfn-demo-lambda-role \
  --zip-file fileb://validate_order.zip

aws lambda create-function --function-name sfn-notify-customer \
  --runtime python3.12 --handler notify_customer.handler \
  --role arn:aws:iam::ACCOUNT:role/sfn-demo-lambda-role \
  --zip-file fileb://notify_customer.zip

aws lambda create-function --function-name sfn-update-inventory \
  --runtime python3.12 --handler update_inventory.handler \
  --role arn:aws:iam::ACCOUNT:role/sfn-demo-lambda-role \
  --zip-file fileb://update_inventory.zip

Demo Step 2: Create State Machine

1. Create Step Functions IAM Role

aws iam create-role --role-name sfn-demo-execution-role \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"states.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

aws iam put-role-policy --role-name sfn-demo-execution-role \
  --policy-name InvokeLambda --policy-document '{
  "Version":"2012-10-17","Statement":[{"Effect":"Allow",
  "Action":"lambda:InvokeFunction",
  "Resource":"arn:aws:lambda:us-west-2:ACCOUNT:function:sfn-*"}]}'

2. Save state-machine.json (next slide has the full definition)

# Save the JSON definition to a file, then create:
aws stepfunctions create-state-machine \
  --name order-processing-demo \
  --definition file://state-machine.json \
  --role-arn arn:aws:iam::ACCOUNT:role/sfn-demo-execution-role \
  --type STANDARD

# Note the state machine ARN from the output

3. Verify in Console

# Open Step Functions console to see the visual workflow:
# https://console.aws.amazon.com/states/home?region=us-west-2#/statemachines

Demo Step 2: State Machine JSON (state-machine.json)

{
  "Comment": "Order Processing - Task, Choice, Parallel, Wait, Error Handling",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:ACCOUNT:function:sfn-validate-order",
      "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2,
                 "IntervalSeconds": 2, "BackoffRate": 2.0}],
      "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
      "Next": "CheckPriority"
    },
    "CheckPriority": {
      "Type": "Choice",
      "Choices": [
        {"And": [{"Variable": "$.priority", "IsPresent": true}, {"Variable": "$.priority", "StringEquals": "express"}], "Next": "ExpressShip"},
        {"Variable": "$.total", "NumericGreaterThan": 100, "Next": "ExpressShip"}
      ],
      "Default": "StandardShip"
    },
    "ExpressShip": {"Type":"Pass","Result":"express","ResultPath":"$.shipMethod","Next":"NotifyAndUpdate"},
    "StandardShip": {"Type":"Pass","Result":"standard","ResultPath":"$.shipMethod","Next":"NotifyAndUpdate"},
    "NotifyAndUpdate": {
      "Type": "Parallel",
      "Branches": [
        {"StartAt":"Notify","States":{"Notify":{"Type":"Task","Resource":"arn:aws:lambda:us-west-2:ACCOUNT:function:sfn-notify-customer","End":true}}},
        {"StartAt":"UpdateDB","States":{"UpdateDB":{"Type":"Task","Resource":"arn:aws:lambda:us-west-2:ACCOUNT:function:sfn-update-inventory","End":true}}}
      ],
      "Next": "WaitForProcessing"
    },
    "WaitForProcessing": {"Type": "Wait", "Seconds": 2, "Next": "OrderComplete"},
    "OrderComplete": {"Type": "Succeed"},
    "OrderFailed": {"Type":"Fail","Cause":"Validation failed","Error":"InvalidOrder"}
  }
}
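A misspelled state name in `Next`, `Default`, or `Catch` is an easy mistake to make in a hand-written definition. The sketch below is a small local check, not a replacement for the service's own validation. `find_dangling_targets` is a hypothetical helper, and the sample definition contains a deliberate typo (`CheckPriorty`) to show what it reports.

```python
def referenced_targets(state):
    # Collect every state name this state can transition to.
    targets = []
    if "Next" in state:
        targets.append(state["Next"])
    if "Default" in state:
        targets.append(state["Default"])
    for rule in state.get("Choices", []):
        targets.append(rule["Next"])
    for catcher in state.get("Catch", []):
        targets.append(catcher["Next"])
    return targets


def find_dangling_targets(definition):
    states = definition["States"]
    problems = []
    if definition["StartAt"] not in states:
        problems.append(("StartAt", definition["StartAt"]))
    for name, state in states.items():
        for target in referenced_targets(state):
            if target not in states:
                problems.append((name, target))
        # Parallel branches are nested state machines; check them too.
        for branch in state.get("Branches", []):
            problems.extend(find_dangling_targets(branch))
    return problems


sample_definition = {
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "OrderFailed"}],
            "Next": "CheckPriorty",  # deliberate typo
        },
        "CheckPriority": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.total", "NumericGreaterThan": 100,
                         "Next": "OrderComplete"}],
            "Default": "OrderComplete",
        },
        "OrderComplete": {"Type": "Succeed"},
        "OrderFailed": {"Type": "Fail"},
    },
}

problems = find_dangling_targets(sample_definition)
print(problems)
```

To check the demo file, load it with `json.load(open("state-machine.json"))` and pass the result to `find_dangling_targets`; an empty list means every transition points at a defined state.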

Demo Step 3: Deploy & Test

Create via AWS CLI

# Create the state machine (skip if you already created it in Demo Step 2)
aws stepfunctions create-state-machine \
  --name order-processing-demo \
  --definition file://state-machine.json \
  --role-arn arn:aws:iam::ACCOUNT:role/sfn-demo-execution-role \
  --type STANDARD

Test: Successful Order (Choice = express)

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-west-2:ACCOUNT:stateMachine:order-processing-demo \
  --input '{"orderId":"ORD-001","item":"Laptop","quantity":1,"price":999,"priority":"express"}'

Test: Standard Order (Choice = default)

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-west-2:ACCOUNT:stateMachine:order-processing-demo \
  --input '{"orderId":"ORD-002","item":"Mouse","quantity":2,"price":25}'

Test: Failed Validation (triggers Catch)

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-west-2:ACCOUNT:stateMachine:order-processing-demo \
  --input '{"quantity":1}'
# Missing orderId and item - triggers error -> OrderFailed state
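Before running the three tests, it can help to predict where each input should end up. This is a local dry run that mirrors the validation logic and the two Choice rules in plain Python; it does not call AWS. `predict_path` is a hypothetical helper, and a missing `priority` is treated here as "not express".

```python
def validate_order(event):
    # Same checks and total calculation as validate_order.py.
    if not event.get("orderId") or not event.get("item"):
        raise Exception("InvalidOrder: Missing orderId or item")
    order = dict(event)
    order["validated"] = True
    order["total"] = event.get("quantity", 1) * event.get("price", 10)
    return order


def check_priority(order):
    # Mirrors the Choice rules: explicit express priority, or total over 100.
    if order.get("priority") == "express":
        return "ExpressShip"
    if order["total"] > 100:
        return "ExpressShip"
    return "StandardShip"


def predict_path(execution_input):
    try:
        order = validate_order(execution_input)
    except Exception:
        # The Catch on ValidateOrder routes any error to OrderFailed.
        return "OrderFailed"
    return check_priority(order)


test_inputs = [
    {"orderId": "ORD-001", "item": "Laptop", "quantity": 1, "price": 999,
     "priority": "express"},
    {"orderId": "ORD-002", "item": "Mouse", "quantity": 2, "price": 25},
    {"quantity": 1},
]
paths = [predict_path(test_input) for test_input in test_inputs]
print(paths)  # ['ExpressShip', 'StandardShip', 'OrderFailed']
```

The second order totals 50, below the 100 threshold, so it takes the default branch.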

Demo: What to Show in Console

Concepts Demonstrated

State Type | Where in Demo
Task | ValidateOrder, Notify, UpdateDB (Lambda invocations)
Choice | CheckPriority (routes express vs standard)
Parallel | NotifyAndUpdate (email + inventory simultaneously)
Wait | WaitForProcessing (2-second delay)
Pass | ExpressShip/StandardShip (inject data)
Retry/Catch | ValidateOrder retries + catches to OrderFailed